Method and system for managing metadata

ABSTRACT

A computer-based method and scoring system for management of metadata is provided.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation of and claims the benefit of U.S. application Ser. No. 14/048,280, filed on Oct. 8, 2013, entitled METHOD AND SYSTEM FOR MANAGING METADATA, which claims the benefit of priority to U.S. Provisional Application Ser. No. 61/710,882, filed Oct. 8, 2012, entitled METHOD AND SYSTEM FOR MANAGING METADATA, all of which are expressly incorporated by reference herein in their entireties.

FIELD OF THE INVENTION

The present invention provides a computer-based method for the management of metadata wherein metadata is entered, scored by comparison to one or more standards or rules, and a report of errors, warnings and recommendations is produced. The computer-based method further provides a means for correction and/or editing and improvement of the metadata, as well as the distribution of the improved metadata. Also provided by the present invention is a scoring system for the quality of the metadata.

BACKGROUND OF THE INVENTION

Metadata or metacontent provides information about one or more aspects of data including, but not limited to, means of creation of the data, purpose of the data, time and date of creation, creator or author of data, and standards used.

Many organizations, such as the American National Standards Institute (ANSI), the International Organization for Standardization (ISO), the Book Industry Study Group (BISG) and EDItEUR, have established extensive, detailed rules and/or standards for metadata and registries in various disciplines. These standards often govern the .xml computer code used to share catalogue information between parties in fields where correct, standardized catalogs are important, such as books, art, images, music and film.

Managers of such catalogs attempt to keep hundreds, thousands, and sometimes hundreds of thousands of individual records complete, of the highest quality, as determined, for example, by searchability, relevance and richness, and in compliance with such standards.

Computer-based methods exist for evaluating and/or verifying whether the necessary tags of a file containing .xml metadata are complete, are in the correct order, or have the right codes for specific tags.

However, evaluating whether or not the provided metadata in the fields meets the established rules and/or standards and/or best practices for the selected discipline with respect to accuracy and/or format and/or quality, and/or relevance and/or completeness requires review by a human with advanced expertise in the rules and/or standards and/or best practices and/or quality of metadata for the selected discipline. Such human review is extremely time-consuming and laborious and is often inaccurate. Further, it is not possible for a human or a group of humans to maintain the huge number of records in these catalogs, often kept in different places and/or databases, in conformance with very nuanced and specific industry standards and/or other qualitative standards. Currently, such review is performed by random selection of catalogue entries for human review. This method provides neither an accurate representation of the quality of the metadata nor a cost-effective means for those who need to maintain accuracy and consistency within large catalogs.

SUMMARY OF THE INVENTION

An aspect of the present invention relates to a computer-based method for managing metadata. In this method, metadata for a selected discipline is uploaded to a computer processor. The metadata file is first checked via the computer for completion of the necessary fields. The uploaded metadata is then compared via the computer to one or more selected rules, standards and/or best practices for the selected discipline for accuracy and/or format and/or completeness and/or quality and a breakdown of errors and/or warnings from these comparisons is provided. Errors identified in these comparison steps by the computer are inclusive of missing fields in the data file as well as errors in metadata previously only identified visually by experts with years of experience in the rules and/or standards and/or best practices regarding metadata for that selected discipline. A means to correct and/or edit and/or enhance and/or improve the metadata is then provided.

In one embodiment, one or more scores are assigned to the metadata based upon completion of the necessary fields, the comparison of the entered metadata to the selected one or more rules, standards and/or best practices of the selected discipline and/or the quality of the metadata.

Accordingly, another aspect of the present invention relates to a scoring system for metadata by which users and/or consumers of such data can assess accuracy and/or reliability and/or completeness and/or richness and/or quality of the metadata. In this system, a computer processor is provided for entry of the metadata file for a selected discipline. A means is provided for checking the metadata file for completion of the necessary fields. A means is also provided for comparison of the metadata to one or more selected rules, standards and/or best practices for the selected discipline. Errors identified by this comparison means include errors in metadata previously identified only by visual inspection by experts with years of experience in the rules and/or standards and/or best practices and/or quality regarding metadata for that selected discipline. A score card is generated, and one or more scores indicative of the quality and/or accuracy and/or completeness and/or richness of the metadata are assigned to the metadata based upon the comparisons.

BRIEF DESCRIPTION OF THE FIGURES

FIG. 1 is a block diagram depicting flow of information in the method of the present invention.

FIGS. 2A through 2C are examples of scorecards generated with the methods and scoring system of the present invention.

FIGS. 3A-3H provide nonlimiting examples of several industry standards or rules, the qualifier for the standard or rule, if required, and the logic for comparison of the metadata with the rule or standard.

FIG. 4 provides an example of an .xml data file containing an error, its identification via the method of the present invention and the .xml data file following correction of the error.

FIGS. 5A through 5E are screen shots from the method of the present invention exemplifying the process of identifying and correcting an error in the metadata. FIG. 5A is a screen shot of a list of rules not followed and indicates that there is a title missing the necessary age qualification. FIG. 5B is a screen shot showing the book page of the title with an issue. In this example, this title has several issues and all are highlighted. FIG. 5C is a screen shot showing the book page following correction, where the title has had the U.S. School Grades added. FIG. 5D is a screen shot showing the data manager system where the necessary information can be added, and FIG. 5E is a screen shot showing the data manager adding the school grades.

DETAILED DESCRIPTION OF THE INVENTION

The present invention provides a computer-based method for management of metadata as well as a scoring system for metadata. With the method of the present invention, a metadata file for a selected discipline can be uploaded, evaluated, scored, corrected, and maintained.

In one embodiment, the computer-based method is provided in a user-friendly, cloud-based environment accessible via any web browser. Alternatively, the computer-based method may be hosted on a user's own network. The method of the present invention can operate on a single-user platform or on a multi-user platform enabling multiple users to evaluate, correct, maintain and/or enhance metadata collaboratively from a website.

Examples of the various selected disciplines for which metadata can be managed in accordance with the present invention include, but are not limited to, metadata for books, images, film, music, art, and cultural collections.

FIG. 1 shows the flow of information in the computer-based method of the present invention.

In the method of the present invention, metadata for a selected discipline is first uploaded to a computer processor. See step 101 of FIG. 1. The data can be uploaded in various manners. In one nonlimiting embodiment, the data is uploaded as an .xml file. In another nonlimiting embodiment, the data is uploaded in a spreadsheet format, such as an EXCEL file.
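As a minimal sketch of step 101, assuming a hypothetical, flattened record layout (each `<Product>` child element becomes one field) and a CSV export standing in for the EXCEL file, both upload formats might be normalized into the same list of records before checking. Real ONIX files are deeply nested, and reading .xlsx files directly would require a third-party library; the function names here are illustrative only.

```python
import csv
import io
import xml.etree.ElementTree as ET


def records_from_xml(text: str) -> list[dict[str, str]]:
    """Parse a simplified XML catalogue into one dict per <Product> element."""
    root = ET.fromstring(text)
    return [
        {child.tag: (child.text or "").strip() for child in product}
        for product in root.iter("Product")
    ]


def records_from_spreadsheet(text: str) -> list[dict[str, str]]:
    """Parse a CSV export of a spreadsheet; the header row names the fields."""
    reader = csv.DictReader(io.StringIO(text))
    return [{key: (value or "").strip() for key, value in row.items()} for row in reader]


def load_metadata(text: str, fmt: str) -> list[dict[str, str]]:
    """Normalize either upload format into the same list-of-dicts shape."""
    if fmt == "xml":
        return records_from_xml(text)
    if fmt == "csv":
        return records_from_spreadsheet(text)
    raise ValueError(f"unsupported format: {fmt}")
```

Normalizing early means every later step (completeness checks, rules, scoring) can operate on one record shape regardless of how the catalogue arrived.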

In one nonlimiting embodiment of the present invention, the metadata uploaded, scored, corrected, and/or maintained is for books. Book metadata, when encoded correctly, helps booksellers, data providers, distributors, search engines and consumers find and access information about books. It also helps with inventory tracking and the like. This metadata includes, but is in no way limited to, book formats and prices, titles, publisher/imprint/brand names, videos and other multimedia, descriptions, and identifiers for the book. ONIX (ONline Information eXchange), an .XML-based standardized format for transmitting information electronically, is an example of a standard metadata format for the publishing industry. Accordingly, in one embodiment of the present invention, the computer-based method may be designed to export .XML-based ONIX data uploaded by the user.

As will be understood by the skilled artisan upon reading this disclosure, however, the computer-based method and scoring system described herein are routinely adaptable to other selected disciplines involving metadata, especially those with schemas based upon an .xml standard. In the method of the present invention, the metadata file is first checked for completeness of the necessary data fields. For example, in an embodiment involving an .xml file, the .xml file is checked for completion of the tags, schemes, etc. See step 102 of FIG. 1. Also see FIG. 4, providing a nonlimiting example of an .xml data file containing an error, specifically missing data, its identification via the method of the present invention, and the .xml data file following correction of this error in accordance with step 102 of the present invention.
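The completeness check of step 102 might be sketched as follows. The list of required fields is a hypothetical stand-in; a real deployment would derive it from the governing standard (e.g., the ONIX specification) rather than hard-coding it. The sketch distinguishes a tag that is absent from a tag that is present but empty, the kind of missing data illustrated in FIG. 4.

```python
import xml.etree.ElementTree as ET

# Hypothetical required fields, for illustration only.
REQUIRED_FIELDS = ("RecordReference", "ProductIdentifier", "TitleText", "PublisherName")


def check_completeness(product: ET.Element) -> list[str]:
    """Return the required fields that are missing or empty in one <Product>."""
    problems = []
    for name in REQUIRED_FIELDS:
        element = product.find(name)  # direct child with this tag, or None
        if element is None:
            problems.append(f"{name}: tag missing")
        elif not (element.text or "").strip():
            problems.append(f"{name}: tag present but empty")
    return problems
```

For example, a `<Product>` with an empty `<TitleText>` and no `<ProductIdentifier>` yields two problems, one per field. Note the explicit `is None` test: an `Element` with no children is falsy, so a bare truthiness check would misreport present tags as missing.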

In the method of the present invention, the uploaded metadata is then further compared to one or more selected rules, standards and/or best practices for the selected discipline. See step 103 of FIG. 1. Rules, standards and/or best practices, for purposes of the present invention, may come from the consumer and/or from the industry, and/or may be a quality standard or assess quality metrics, and/or may be based upon recommendations, suggestions and/or proposals by the inventors. For example, for metadata relating to books, the metadata may be compared to one or more ONIX industry standards and/or one or more best practices including, but in no way limited to, those set forth by the Book Industry Study Group and/or Editeur.org and the recommendations, proposals and/or suggestions of the inventors herein. In one nonlimiting embodiment, recommendations, proposals and/or suggestions of the inventors herein may enhance metadata quality. In another nonlimiting embodiment, recommendations, proposals and/or suggestions of the inventors herein may enhance discoverability, searchability, saleability and/or profitability of items in the catalogue. In accordance with the method of the present invention, rules have been created against which the metadata is compared. These rules may comprise a single entity's best practices set forth by an industry standard-setting body, or selected sets of rules based on a combination of industry best practices, recommendations, suggestions and/or proposals as determined by the inventors and/or users. In one embodiment, the comparison may be based upon a set of rules developed by the inventors herein from analyzing other data sets. Rules may also be written based on logic specific to the user. These rules check not only the .xml structure but also the data within the .xml. For this step, the user may select the one or more consumer or industry rules or standards and/or best practices against which they want their metadata compared. 
The method may provide listings of consumer or industry rules or standards and/or best practices from which the user may select for comparison. Alternatively, or in addition, the user may select their own listing of standards, rules and/or best practices against which to compare their metadata. In one embodiment, the user may select rules to generate a particular scorecard, nonlimiting examples of which are depicted in FIGS. 2A-2C. The user may select more than one listing for comparison and generate more than one error and/or warning list and more than one score for their metadata. In this step of the computer-based method of the present invention, errors are identified via algorithms written by the inventors for the present invention which, prior to the invention, required visual review by experts with years of experience in the rules, standards and best practices regarding metadata of the selected discipline. Such human review is both time-consuming and laborious, as the rules, standards and best practices for metadata in a selected discipline are extensive and complex, differing between countries and/or governing bodies and/or industries. In addition, the rules, standards and best practices change frequently, and new rules are added to keep up with changes in technology and/or to improve discoverability, searchability, saleability and/or profitability of catalogue items within a given industry.
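One way to model user-selectable rule listings is a registry of rules plus named selections of rule identifiers, each selection corresponding loosely to a scorecard. The sketch below is illustrative only: the rule identifiers, messages and scorecard names are hypothetical, and the actual algorithms of the invention (e.g., those in FIGS. 3A-3H) are not reproduced.

```python
from dataclasses import dataclass
from typing import Callable

Record = dict[str, str]


@dataclass(frozen=True)
class Rule:
    rule_id: str                      # hypothetical identifier, e.g. "TITLE-CAPS"
    message: str
    check: Callable[[Record], bool]   # returns True when the record complies


# Hypothetical rules for illustration.
RULES = {
    "TITLE-CAPS": Rule("TITLE-CAPS", "Title is in all capital letters",
                       lambda r: not r.get("TitleText", "").isupper()),
    "DESC-PRESENT": Rule("DESC-PRESENT", "No description supplied",
                         lambda r: bool(r.get("Description", "").strip())),
}

# A scorecard is modeled here as a named selection of rule ids.
SCORECARDS = {
    "basic": ["TITLE-CAPS"],
    "retail": ["TITLE-CAPS", "DESC-PRESENT"],
}


def evaluate(records: list[Record], scorecard: str) -> list[tuple[str, str]]:
    """Return (record reference, rule id) for every rule in the scorecard that fails."""
    failures = []
    for record in records:
        for rule_id in SCORECARDS[scorecard]:
            if not RULES[rule_id].check(record):
                failures.append((record.get("RecordReference", "?"), rule_id))
    return failures
```

Running the same records against two scorecards produces two different failure lists, mirroring how a user may select more than one listing and obtain more than one error list and score.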

A nonlimiting example of an error identified at this step is a Title written in all capital letters and/or containing the words “volume 27” when ONIX provides a separate field for the volume. As another nonlimiting example, this step identifies the error of a French-based publisher presenting a book title in any format other than capitalization of only the first word and proper nouns, or the error of a U.S. publisher presenting a book title with only the first word capitalized instead of in headline case. As another nonlimiting example, this step identifies the error of having only an English-language description for a book that is sold in non-English-speaking countries. As another nonlimiting example, if a user's digital rights to a book include all the countries of the world and the user is selling those digital rights priced only in the publisher's home currency, this error is identified. Prior to the instant invention, such subtle, yet significant, errors could only be identified through visual inspection of randomly selected files by an expert in the metadata rules, standards and/or best practices for the selected discipline.
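Several of these examples are dependent-field checks: whether the title's casing is correct depends on a second datum (the publisher's country), and whether an English-only description is a problem depends on the sales territories. A rough sketch follows, assuming hypothetical country tables and simple word-level heuristics. Detecting proper nouns reliably is beyond a heuristic, so the French check is phrased as something to verify rather than a definite error.

```python
import re

# Hypothetical convention tables for illustration only.
SENTENCE_CASE_COUNTRIES = {"FR"}   # capitalize only the first word and proper nouns
HEADLINE_CASE_COUNTRIES = {"US"}   # headline-style capitalization
ENGLISH_SPEAKING = {"US", "GB", "AU", "CA", "NZ", "IE"}


def title_issues(title: str, publisher_country: str) -> list[str]:
    """Check a title; the casing rule applied depends on the publisher's country."""
    issues = []
    if title.isupper():
        issues.append("all-caps title")
    if re.search(r"\bvolume\s+\d+\b", title, re.IGNORECASE):
        issues.append("volume number in title; use the separate volume field")
    # Words after the first one that start with a letter (skips numbers).
    later_words = [w for w in title.split()[1:] if w[0].isalpha()]
    if (publisher_country in HEADLINE_CASE_COUNTRIES and later_words
            and all(w[0].islower() for w in later_words)):
        issues.append("U.S. title not in headline case")
    if (publisher_country in SENTENCE_CASE_COUNTRIES and not title.isupper()
            and any(w[0].isupper() for w in later_words)):
        issues.append("French title has capitals after the first word (check proper nouns)")
    return issues


def description_issues(description_languages: set[str], sales_countries: set[str]) -> list[str]:
    """Flag English-only descriptions on titles sold outside English-speaking markets."""
    if description_languages == {"eng"} and sales_countries - ENGLISH_SPEAKING:
        return ["English-only description for non-English-speaking markets"]
    return []
```

For instance, `title_issues("The sea of tides", "US")` reports that the U.S. title is not in headline case, while the same string under `"FR"` passes.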

Nonlimiting examples of several industry standards or rules, the qualifier for the standard or rule, if required, and the logic for comparison of the metadata with the rule or standard are set forth in FIGS. 3A-3H. The method of the present invention comprises multiple similar algorithms for other standards and rules, as well as for inventor recommendations, proposals and/or suggestions, for use in the instant invention.
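The pairing of a qualifier with comparison logic might be represented as below: the logic is evaluated only for records the qualifier selects. The specific rule is hypothetical, loosely modeled on the FIG. 5A example of a title missing its age qualification; the field names `Audience`, `AgeRange` and `USSchoolGrade` are assumptions, not ONIX element names.

```python
from dataclasses import dataclass
from typing import Callable, Optional

Record = dict[str, str]


@dataclass(frozen=True)
class QualifiedRule:
    """A rule whose logic runs only when its (optional) qualifier holds."""
    name: str
    logic: Callable[[Record], bool]                    # True when compliant
    qualifier: Optional[Callable[[Record], bool]] = None

    def applies_to(self, record: Record) -> bool:
        return self.qualifier is None or self.qualifier(record)

    def violated_by(self, record: Record) -> bool:
        return self.applies_to(record) and not self.logic(record)


# Hypothetical rule: children's titles should carry an age or U.S. school-grade range.
AGE_QUALIFICATION = QualifiedRule(
    name="children's title missing age/grade range",
    qualifier=lambda r: r.get("Audience") == "children",
    logic=lambda r: bool(r.get("AgeRange") or r.get("USSchoolGrade")),
)
```

Separating the qualifier keeps adult titles from being flagged at all, rather than being flagged and then excused.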

A breakdown of errors and warnings from the comparison is then generated and provided to the user. See step 104 of FIG. 1. By errors is meant problems that must be fixed for the file to comply with industry standards. By warnings is meant issues that may cause problems, or may in fact be errors, but that must be checked to determine whether the user wants to make an exception for them.
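The error/warning distinction of step 104 could be carried as a severity on each finding, with the breakdown grouping findings by severity and field. This is a minimal sketch; the `Finding` structure and its field names are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    ERROR = "error"      # must be fixed for the file to comply
    WARNING = "warning"  # may be a problem; the user may accept it as an exception


@dataclass(frozen=True)
class Finding:
    record_ref: str
    field: str
    severity: Severity
    message: str


def breakdown(findings: list[Finding]) -> dict[Severity, Counter]:
    """Group findings by severity, counting how many affect each field."""
    result = {severity: Counter() for severity in Severity}
    for finding in findings:
        result[finding.severity][finding.field] += 1
    return result
```

Counting per field lets the report show, for example, that one kind of title error recurs across many records, which is what makes a global correction worthwhile.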

For example, for book metadata, the metadata may be scanned against all the rules for a selected scorecard, nonlimiting examples of which are depicted in FIGS. 2A-2C. The user then sees a list of results, listed by priority or type and/or data field. The user can then review the list of titles with this error or data resulting in a warning, and then they can apply a global or selected update or correction to all or selected titles which is immediately updated in the metadata. They can also review, correct and update an individual title after reviewing all that title's metadata with the errors and/or warnings highlighted. Thus, data errors and/or data resulting in a warning can be corrected automatically or on a case-by-case basis by user selection of the data errors and data resulting in warnings to be corrected.
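A global or selected update might look like the following sketch, which applies one correction function to a field in every record or only in records the user selected by reference. The function name and record layout are hypothetical, and `str.title` is used in the example only as a crude stand-in for a real casing correction.

```python
from typing import Callable, Iterable, Optional

Record = dict[str, str]


def apply_correction(
    records: list[Record],
    field: str,
    fix: Callable[[str], str],
    selected: Optional[Iterable[str]] = None,
) -> int:
    """Apply `fix` to `field` in all records, or only in the selected ones.

    Records are updated in place; the number of records actually changed is returned.
    """
    chosen = None if selected is None else set(selected)
    changed = 0
    for record in records:
        if chosen is not None and record.get("RecordReference") not in chosen:
            continue
        old = record.get(field)
        if old is None:
            continue
        new = fix(old)
        if new != old:
            record[field] = new
            changed += 1
    return changed
```

Passing `selected=["r1"]` corrects a single title (the case-by-case path), while omitting `selected` applies the same fix across the whole catalogue.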

Accordingly, the computer-based method also provides a means to correct and/or edit and/or enhance and/or improve and/or keep current the metadata. See step 105 of FIG. 1. Also see FIGS. 5A through 5E, providing screen shots from the method of the present invention exemplifying the process of identifying and correcting an error in the metadata. Once a breakdown of errors and/or warnings is generated, the user may routinely navigate to an area that needs work, sorting and drilling down to fix problems and investigate patterns. The method provides searching and sorting tools which provide the ability to locate and isolate issues within the metadata. The method further provides the ability to create metadata subsets to work on by browsing or searching, thus making it easy for the user to access the metadata that needs to be managed and/or corrected and/or improved or enhanced.
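The searching and sorting tools could, at their simplest, filter issues by field and by a text query and then order them by priority. In this sketch the `Issue` structure and its numeric priority scale (1 = most urgent) are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Issue:
    record_ref: str
    field: str
    priority: int   # hypothetical scale: 1 = most urgent
    message: str


def find_issues(issues: list[Issue], field: Optional[str] = None, text: str = "") -> list[Issue]:
    """Filter by field and by a case-insensitive message substring, then sort by priority."""
    needle = text.lower()
    hits = [
        issue for issue in issues
        if (field is None or issue.field == field) and needle in issue.message.lower()
    ]
    return sorted(hits, key=lambda issue: (issue.priority, issue.record_ref))
```

The resulting list also defines a working subset of records (via `record_ref`) that the user can then open and correct.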

Further, the method provides a means to show potential problems to others, propose edits, and approve changes before committing them.
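A propose/approve/commit cycle might be tracked per edit, as in this sketch. The states, class and field names are hypothetical; the only behavior taken from the text is that a change is approved before it is committed to the metadata.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    PROPOSED = "proposed"
    APPROVED = "approved"
    COMMITTED = "committed"


@dataclass
class ProposedEdit:
    record_ref: str
    field_name: str
    new_value: str
    proposer: str
    status: Status = Status.PROPOSED
    approver: Optional[str] = None

    def approve(self, reviewer: str) -> None:
        """Record who approved the edit; the catalogue is not changed yet."""
        self.status = Status.APPROVED
        self.approver = reviewer

    def commit(self, catalog: dict[str, dict[str, str]]) -> None:
        """Write the new value into the catalogue, but only once approved."""
        if self.status is not Status.APPROVED:
            raise ValueError("only approved edits can be committed")
        catalog[self.record_ref][self.field_name] = self.new_value
        self.status = Status.COMMITTED
```

Keeping edits as objects until commit allows them to be shown to collaborators for review without altering the live metadata.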

Using the method of the present invention, errors can be automatically located and recurring mistakes, errors and deficiencies in the data can be corrected, providing a means by which the entire database can be corrected and enhanced with speed and ease.

The corrected, edited and/or enhanced metadata can then be exported easily to trading partners via the method of the present invention.

In one embodiment, the method of the present invention further comprises assignment of one or more scores based upon the comparison of the metadata, reported to the user in, for example, a scorecard. See step 106 of FIG. 1. Various nonlimiting examples of scorecards are depicted in FIGS. 2A through 2C. Assigned scores may be based upon a number of factors including, but not limited to, the completeness and accuracy of the metadata, its compliance with industry standards, rules and/or best practices and/or inventor recommendations, suggestions and/or proposals, and its richness, meaning the inclusion of additional metadata above and beyond what is required by the industry.
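The text names the score factors (completeness, compliance and richness) but not a formula. The sketch below assumes one plausible combination: completeness as the percentage of required fields holding data, compliance as the percentage of checked rules passed, richness as the percentage of optional enriching fields present, and a weighted sum. The field groups and the 0.4/0.4/0.2 weights are hypothetical.

```python
# Hypothetical field groups and weights, for illustration only.
REQUIRED = ("RecordReference", "ProductIdentifier", "TitleText", "PublisherName")
ENRICHING = ("Description", "Contributor", "CoverImage", "Subject")


def percent_filled(record: dict[str, str], fields: tuple[str, ...]) -> float:
    """Percentage of `fields` that hold non-blank data in `record`."""
    filled = sum(1 for f in fields if record.get(f, "").strip())
    return 100.0 * filled / len(fields)


def score_record(record: dict[str, str], rules_checked: int, rules_failed: int,
                 weights: tuple[float, float, float] = (0.4, 0.4, 0.2)) -> dict[str, float]:
    """Combine completeness, compliance and richness into one weighted score."""
    completeness = percent_filled(record, REQUIRED)
    compliance = (100.0 if rules_checked == 0
                  else 100.0 * (rules_checked - rules_failed) / rules_checked)
    richness = percent_filled(record, ENRICHING)
    w_complete, w_comply, w_rich = weights
    overall = w_complete * completeness + w_comply * compliance + w_rich * richness
    return {"completeness": completeness, "compliance": compliance,
            "richness": richness, "overall": round(overall, 1)}
```

A record with three of four required fields, 8 of 10 rules passed and one enriching field scores 0.4 × 75 + 0.4 × 80 + 0.2 × 25 = 67.0. Re-scoring after a correction is simply another call with the updated record and rule counts.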

Another advantage of the present invention is that, at least for some users, this may be the first time that all of their metadata is maintained in one place where it can be referred to and referenced.

Also provided by the present invention is a scoring system for metadata. The score generated through the scoring system of the present invention provides a means by which the reliability and/or comprehensiveness of the metadata can be evaluated.

The system of the present invention comprises a computer processor for receiving metadata related to a selected discipline.

The system further comprises both a means for evaluating completeness of the metadata file, meaning that the required fields of the metadata file, such as tags, schemes, etc., are completed, and a means for comparing the received metadata to one or more selected consumer or industry rules or standards and/or best practices for the selected discipline to identify errors that previously could be identified only by visual review by experts with years of experience in the rules, standards and best practices regarding metadata of the selected discipline. One or more scores are then assigned based upon a number of factors including, but not limited to, the completeness and accuracy of the metadata, its compliance with industry standards, rules and/or best practices, and its richness, meaning the inclusion of additional metadata above and beyond what is required by the industry. Nonlimiting examples of score cards are depicted in FIGS. 2A through 2C. Based upon the score or scores, the user can determine whether correction and/or enhancement and/or improvement of the metadata is desired. Once data is corrected, enhanced and/or improved, the user may re-submit the metadata for comparison to receive a new score based upon the changes. Scores can be used internally as well as by trading partners and other metadata recipients to evaluate the provider's products and the information provided therein. 

1-6. (canceled)
 7. A method comprising: assessing, by at least one processor, a metadata file for completion of fields defined by a standard; evaluating the metadata file against evaluation criteria, wherein the evaluation criteria is a collection of algorithms for evaluating data in the fields for compliance with a plurality of rules of the standard, wherein the metadata file includes at least a first data in a first field and a second data in a second field, wherein the evaluation includes determining whether the second data complies with at least one of the plurality of rules, wherein the second data is dependent on the first data according to the evaluation criteria; calculating at least one score for the metadata file based upon the assessment of the completion of required fields and the evaluation of the metadata file against the evaluation criteria; and presenting, by the at least one processor, a report of results of the assessment and the evaluation in a graphical user interface, the report including the at least one score for the metadata file.
 8. The method of claim 7, further comprising: receiving, in the graphical user interface, an edit to the first data or the second data; and adjusting the at least one score for the metadata file to reflect the edit to the first data or the second data.
 9. The method of claim 7, further comprising: presenting a suggested improvement to the first data or the second data corresponding to one or more data errors in the report of the results of the assessment and the evaluation in the graphical user interface, wherein the suggested improvements include an explanation of the suggested improvement to the first data or the second data; receiving, in the graphical user interface, an edit to the first data or the second data.
 10. The method of claim 9, further comprising: automatically making changes to the first data or the second data according to the suggested improvements.
 11. The method of claim 7, further comprising: receiving a selection of one or more data errors in the report of the results of the evaluation in the graphical user interface; and presenting a subset of the one or more data errors resulting from the selection of the one or more data errors.
 12. The method of claim 7, further comprising: after the presenting the results of the assessment and the evaluation, receiving a selection of one or more data errors; applying an update to correct a selected error; and adjusting the at least one score for the metadata file to reflect the correction of the selected error.
 13. The method of claim 12, wherein the applying the update to correct the selected error includes applying the update to a plurality of records having the selected error to correct the selected error across the plurality of records.
 14. The method of claim 7, further comprising: after the presenting the results of the assessment and the evaluation, receiving inputs to navigate, sort, or filter to identify a subset of errors.
 15. The method of claim 7, wherein the at least one score includes a completeness component as judged by a percentage of the fields that include the data, and a quality component as judged by the evaluation of the metadata file against the evaluation criteria.
 16. The method of claim 7, wherein the metadata file includes the data pertaining to a publication.
 17. A system comprising: a storage configured to store instructions; a processor configured to execute the instructions and cause the processor to: assess, by at least one processor, a metadata file for completion of fields defined by a standard; evaluate the metadata file against evaluation criteria, wherein the evaluation criteria is a collection of algorithms for evaluating data in the fields for compliance with a plurality of rules of the standard, wherein the metadata file includes at least a first data in a first field and a second data in a second field, wherein the evaluation includes determining whether the second data complies with at least one of the plurality of rules, wherein the second data is dependent on the first data according to the evaluation criteria; and calculate at least one score for the metadata file based upon the assessment of the completion of required fields and the evaluation of the metadata file against the evaluation criteria.
 18. The system of claim 17, wherein the processor is configured to execute the instructions and cause the processor to: receive, in the graphical user interface, an edit to the first data or the second data; and adjust the at least one score for the metadata file to reflect the edit to the first data or the second data.
 19. The system of claim 17, wherein the processor is configured to execute the instructions and cause the processor to: receive a selection of one or more data errors in the report of the results of the evaluation in the graphical user interface; and present a subset of the one or more data errors resulting from the selection of the one or more data errors.
 20. The system of claim 17, wherein the processor is configured to execute the instructions and cause the processor to: after the presentation of the results of the assessment and the evaluation, receive inputs to navigate, sort, or filter to identify a subset of errors.
 21. A non-transitory computer-readable medium comprising instructions, the instructions, when executed by a computing system, cause the computing system to: assess, by at least one processor, a metadata file for completion of fields defined by a standard; evaluate the metadata file against evaluation criteria, wherein the evaluation criteria is a collection of algorithms for evaluating data in the fields for compliance with a plurality of rules of the standard, wherein the metadata file includes at least a first data in a first field and a second data in a second field, wherein the evaluation includes determining whether the second data complies with at least one of the plurality of rules, wherein the second data is dependent on the first data according to the evaluation criteria; and calculate at least one score for the metadata file based upon the assessment of the completion of required fields and the evaluation of the metadata file against the evaluation criteria.
 22. The computer-readable medium of claim 21, wherein the computer-readable medium further comprises instructions that, when executed by the computing system, cause the computing system to: receive, in the graphical user interface, an edit to the first data or the second data; and adjust the at least one score for the metadata file to reflect the edit to the first data or the second data.
 23. The computer-readable medium of claim 21, wherein the computer-readable medium further comprises instructions that, when executed by the computing system, cause the computing system to: present a suggested improvement to the first data or the second data corresponding to one or more data errors in the report of the results of the assessment and the evaluation in the graphical user interface, wherein the suggested improvements include an explanation of the suggested improvement to the first data or the second data; receive, in the graphical user interface, an edit to the first data or the second data.
 24. The computer-readable medium of claim 21, wherein the computer-readable medium further comprises instructions that, when executed by the computing system, cause the computing system to: after the presenting the results of the assessment and the evaluation, receive a selection of one or more data errors; apply an update to correct a selected error; and adjust the at least one score for the metadata file to reflect the correction of the selected error.
 25. The computer-readable medium of claim 24, wherein the applying the update to correct the selected error includes applying the update to a plurality of records having the selected error to correct the selected error across the plurality of records.
 26. The computer-readable medium of claim 21, wherein the computer-readable medium further comprises instructions that, when executed by the computing system, cause the computing system to: after the presentation of the results of the assessment and the evaluation, receive inputs to navigate, sort, or filter to identify a subset of errors. 